It is well known that biological and social interaction networks have varying degrees of redundancy, though a consensus on the precise cause of this is so far lacking. In this paper, we introduce a topological redundancy measure for labeled directed networks that is formal, computationally efficient and applicable to a variety of directed networks such as cellular signaling, metabolic and social interaction networks. We demonstrate the computational efficiency of our measure by computing its value and statistical significance on a number of biological and social networks with up to several thousands of nodes and edges. Our results suggest a number of interesting observations: (1) social networks are more redundant than their biological counterparts, (2) transcriptional networks are less redundant than signaling networks, (3) the topological redundancy of the C. elegans metabolic network is largely due to its inclusion of currency metabolites, and (4) the redundancy of signaling networks is highly (negatively) correlated with the monotonicity of their dynamics.

I. INTRODUCTION

The concepts of degeneracy and redundancy are well known in information theory. Loosely speaking, degeneracy refers to structurally different elements performing the same function, whereas redundancy refers to identical elements performing the same function.^1 In electronic systems, such measures are useful in analyzing properties such as fault-tolerance. It is an accepted fact that biological networks do not necessarily have the lowest possible degeneracy or redundancy; for example, the connectivity of neurons in brains suggests a high degree of degeneracy [?]. However, as Tononi, Sporns and Edelman observed in their paper [?]:

    Although many similar examples exist in all fields and levels of biology, a specific notion of degeneracy has yet to be firmly incorporated into biological thinking, largely because of the lack of a formal theoretical framework.

The same comment holds true about redundancy as well. A further reason for the lack of incorporation of these notions in biological thinking is the lack of effective algorithmic procedures for computing these measures for large-scale networks even when formal definitions are available. Therefore, such studies are often done in a somewhat ad hoc fashion, as in [?]. There do exist notions of "redundancy" in the field of analysis of undirected networks based on clustering coefficients (e.g., see [?]) or betweenness centrality measures (e.g., see [?]). However, such notions are not appropriate for the analysis of biological networks where one must distinguish positive from negative regulatory interactions, and where the study of dynamics is of interest.

^1 We remind the reader that the term "redundancy" is also used in other contexts in biology unrelated to the definition of redundancy in this paper. For example, some researchers use redundancy to refer to paralogous genes that can provide functional backup for one another [1]. In addition, some researchers use the two terms, redundancy and degeneracy, interchangeably or use other terminologies for these concepts.

II. BRIEF REVIEW OF INFORMATION-THEORETIC
DEGENERACY AND REDUNDANCY MEASURES

Formal information-theoretic definitions of degeneracy and redundancy for dynamic biological systems were proposed in [?] (see also [?, 9]) based on mutual-information contents. These definitions assume access to suitable perturbation experiments and corresponding accurate measurements of the relevant parameters. Thus, they are not directly comparable to the topology-based redundancy measures that we propose in this paper. Nonetheless, we next briefly review these definitions as a way to illustrate some key points of other measures often used in the literature that motivated us to define our new redundancy measure.

The authors of [?] consider a system consisting of $n$ elements that produces a set of outputs $O$ via a fixed connectivity matrix from a subset of these elements. The elements are described by a jointly distributed random vector $X$ that represents steady-state activities of the components of their system. The degeneracy $\mathcal{D}(X;O)$ of the system is then expressed as the average mutual information (MI) shared between $O$ and the "perturbed" bipartitions of $X$ summed over all bipartition sizes (Equation [2b] of [?]), i.e.,

$$\mathcal{D}(X;O) = \frac{1}{2} \sum_{k=1}^{n} \sum_{j} \left( \mathrm{MI}^{P}(X_j^k;O) + \mathrm{MI}^{P}(X \setminus X_j^k;O) - \mathrm{MI}^{P}(X;O) \right) \qquad (1)$$

where $X_j^k$ is the $j$th subset of $X$ composed of $k$ elements and the notation $\mathrm{MI}^{P}(A;O)$ denotes the mutual information between a subset of elements $A$ and an output set $O$, when $A$ is injected with a small fixed amount of uncorrelated noise;^2 see [?] for details. One can immediately see a computational difficulty in applying such a definition: the number of possible bipartitions could be astronomically large even for a network of modest size. For example, for a network with 100 nodes, which is a number smaller than all but one of the networks considered in this paper, the number of bipartitions is roughly $2^{100} > 10^{30}$. Measures avoiding averaging over all bipartitions were also proposed in [3], but the computational complexities and accuracies of these measures remain to be thoroughly investigated and evaluated on larger networks.

^2 $\mathrm{MI}^{P}(A;O) = H(A) + H(O) - H(A,O)$, where $H(A)$ and $H(O)$ are the entropies of $A$ and $O$ considered independently, and $H(A,O)$ is the joint entropy of the subset of elements $A$ and the output set $O$.
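As a small illustration of the entropy identity MI(A;O) = H(A) + H(O) - H(A,O) used in these definitions, the following Python sketch computes the mutual information of two discrete variables from their joint distribution. It deliberately ignores the noise-injection ("perturbation") step of the reviewed measures, and the two toy distributions are hypothetical examples, not data from the paper.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def mutual_information(joint):
    """MI(A;O) = H(A) + H(O) - H(A,O) for a joint distribution {(a, o): p}."""
    marg_a, marg_o = defaultdict(float), defaultdict(float)
    for (a, o), p in joint.items():
        marg_a[a] += p   # marginal distribution of A
        marg_o[o] += p   # marginal distribution of O
    return entropy(marg_a) + entropy(marg_o) - entropy(joint)

# Hypothetical toy distributions (illustration only).
copy_joint = {(0, 0): 0.5, (1, 1): 0.5}                          # O copies A
indep_joint = {(a, o): 0.25 for a in (0, 1) for o in (0, 1)}     # A, O independent

mi_copy = mutual_information(copy_joint)     # one full bit is shared
mi_indep = mutual_information(indep_joint)   # nothing is shared
```

Evaluating such a quantity for every bipartition of a 100-node system is what makes the information-theoretic definition impractical at scale.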

In a similar manner, the redundancy $R(X;O)$ of a system $X$ was defined in [?] as the difference between summed mutual information upon perturbation between all subsets of size up to 1 and $O$, and the mutual information between the entire system and $O$ (Equation [3] in [?]), i.e.,

$$R(X;O) = \sum_{j=1}^{n} \mathrm{MI}^{P}(X_j^1;O) - \mathrm{MI}^{P}(X;O) \qquad (2)$$

Note that a clear shortcoming of this measure is that it only provides a number, but does not indicate which subsets of elements are redundant. Identifying redundant elements is important for the interpretation of results, and may also serve as an important step of the network construction and refinement process, as we will illustrate in our application to the C. elegans metabolic network and the oriented PPI network. Tononi, Sporns and Edelman [?] illustrated the above measure on a few model networks as a proof of concept, but large networks clearly necessitate alternate measures that allow efficient calculations.

In this paper we propose a new topological measure of 
redundancy. A benefit of our new redundancy measure 
is that we can actually find an approximately minimal 
network and, in the case of multiple minimal networks 
of similar quality, a subset of them by enabling a ran- 
domization step in the algorithmic procedure. We deter- 
mine this redundancy value for a number of biological 
and social networks of large sizes and observe a number 
of interesting properties of our redundancy measure. 

III. MODELS FOR DIRECTED BIOLOGICAL 
AND SOCIAL NETWORKS 

There are two very different levels of models for biological systems. A so-called network topology model (also known as a "wiring diagram" or a "static graph") provides a coarse diagram or map of the physical, chemical, or statistical connections between molecular components of the network, without specifying the detailed kinetics. In this type of model, a network of molecular interactions is viewed as a graph: cellular components are nodes in a network, and the interactions between these components are represented by edges connecting the nodes. In this paper, we are mainly concerned with this type of model; exact details are described in Section III A.

In the other type of model, a network dynamics model, mathematical rules (e.g., systems of Boolean rules or differential equations) are used to specify the behavior over time of each of the molecular components in the network. Our investigation is not directly concerned with such dynamic models. However, since we will show a correlation of our redundancy measure for the network topology model with a property, namely monotonicity, of an associated network dynamics model, we briefly review this model in Section III B.



A. Network Topology Model 




FIG. 1. The network topology model for biological networks. The parity of the pathway B → C ⊣ A ⊣ D is 1 × (−1) × (−1) = 1.

Three common types of molecular biological networks are: transcriptional regulatory networks, metabolic networks, and signaling networks. The nodes of transcriptional regulatory networks represent genes, and edges represent (positive or negative) regulation of a given gene's expression by proteins associated to other genes. The nodes of metabolic networks are metabolites and the edges represent the enzyme-catalyzed reactions in which these metabolites participate as reactants or products. The nodes of signaling networks are proteins and small molecules, and the edges represent physical or chemical interactions or indirect positive or negative causal effects. A unified formalism to describe all these types of networks uses a directed graph $G = (V, E, w)$ with vertex set $V$, edge set $E$, and an edge labeling function $w : E \to \{-1, +1\}$ in which a label of 1 (respectively, −1) represents a positive (respectively, negative) influence. A pathway is then a path $P$ from vertex $u$ to vertex $v$, and the excitatory or inhibitory nature of the pathway is specified by the parity $\prod_{e \in P} w(e) \in \{-1, +1\}$ of such a path $P$; see Fig. 1 for an illustration.
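The parity of a pathway is simply the product of its edge labels. A minimal Python sketch, using only those edges of Fig. 1 that the text names explicitly (the figure may contain further edges, and the dictionary `w` is an illustrative encoding, not the authors' data format):

```python
from math import prod

# Edge labels: +1 = activation (→), -1 = inhibition (⊣).
# Only the Fig. 1 edges mentioned in the text are listed here.
w = {("B", "C"): 1, ("C", "A"): -1, ("A", "D"): -1, ("B", "A"): -1}

def path_parity(path, w):
    """Product of the edge labels along a path given as a list of vertices."""
    return prod(w[(u, v)] for u, v in zip(path, path[1:]))

parity_long = path_parity(["B", "C", "A", "D"], w)   # 1 * (-1) * (-1)
parity_short = path_parity(["B", "A", "D"], w)       # (-1) * (-1)
parity_empty = path_parity(["A"], w)                 # empty product
```

Both pathways from B to D have parity 1, and the empty path has parity 1 because the empty product is 1.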

Our model for directed social interaction networks is 
simply a directed graph in which edges represent signif- 
icant relationships between the entities, e.g., nodes may 
represent web-pages and directed edges may represent 
hyper-links of one web-page in another. Obviously, we 
can think of such a model as one of the above type in 
which all edges are labeled +1 (and, thus all paths have 
the same parity); this allows us to treat both social and 
biological networks in a mathematically uniform manner 
for the purpose of designing and analyzing algorithms. 



B. Network Dynamics and Monotonicity 

Consider systems modeled via ordinary differential equations:

$$\frac{dx_i(t)}{dt} = f_i(x_1(t), x_2(t), \ldots, x_n(t)) \quad \text{for } i = 1, 2, \ldots, n \qquad (3)$$

where $x_i(t)$ indicates the concentration of the $i$th entity in the model at time $t$ and the $f_i$'s are functions of $n$ variables. We assume that $x(t) = (x_1(t), x_2(t), \ldots, x_n(t))$ evolves in an open subset of $\mathbb{R}^n$, the $f_i$'s are differentiable, and solutions are defined for $t \ge 0$. For example, a simple two-species interaction could be described by

$$\frac{dx_1(t)}{dt} = 3x_1(t) - 5x_2(t)$$

$$\frac{dx_2(t)}{dt} = x_1(t) + x_2(t).$$

A particularly appealing class of dynamics is that of monotone systems [?, 10]. Informally, the dynamics of a monotone system preserves a specific partial order (hierarchy) of its inputs over time. Mathematically, monotonicity can be defined as follows.

Definition 1 [?, 10] Given a partial order $\preceq$ over $\mathbb{R}^n$, system (3) is said to be monotone with respect to $\preceq$ if

$$\forall t \ge 0: \; (x_1(0), \ldots, x_n(0)) \preceq (y_1(0), \ldots, y_n(0)) \implies (x_1(t), \ldots, x_n(t)) \preceq (y_1(t), \ldots, y_n(t))$$

where $(x_1(t), \ldots, x_n(t))$ and $(y_1(t), \ldots, y_n(t))$ are the solutions of (3) with initial conditions $(x_1(0), \ldots, x_n(0))$ and $(y_1(0), \ldots, y_n(0))$, respectively.

We will restrict our attention to orthant orders. These are the partial orders $\preceq_s$ over $\mathbb{R}^n$, for any given $s = (s_1, \ldots, s_n) \in \{-1, 1\}^n$, defined as (see [?]):

$$x \preceq_s y \iff \forall i: \; s_i x_i \le s_i y_i$$

In particular, the "cooperative order" is the partial order $\preceq_s$ for $s = (1, 1, \ldots, 1)$.
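The orthant order is a componentwise comparison after flipping the sign of selected coordinates. A short Python sketch of that comparison; the vectors `a` and `b` are arbitrary illustrative values:

```python
def orthant_leq(x, y, s):
    """True if x precedes y in the orthant order given by s in {-1, 1}^n,
    i.e. s_i * x_i <= s_i * y_i for every coordinate i."""
    return all(si * xi <= si * yi for xi, yi, si in zip(x, y, s))

cooperative = (1, 1)    # the "cooperative order"
mixed = (1, -1)         # second coordinate is compared in reverse

a, b = (1.0, 2.0), (3.0, 1.0)
in_cooperative = orthant_leq(a, b, cooperative)   # fails on the second coordinate
in_mixed = orthant_leq(a, b, mixed)               # holds: 1 <= 3 and -2 <= -1
```

The same pair of points can thus be comparable in one orthant order and incomparable in another, which is why monotonicity is always stated with respect to a specific $s$.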

Monotone systems constitute a nicely behaved class 
of dynamical systems in several ways. For example, for 
these systems pathological behaviors (chaotic attractors) 
are ruled out. Even though they may have an arbitrarily 
large dimensionality, monotone systems (under an addi- 
tional irreducibility assumption) behave in many ways 
like one-dimensional systems; for example, bounded tra- 
jectories generically converge to steady states, and stable 
oscillatory behaviors do not exist. Monotonicity with re- 
spect to orthant orders is equivalent to the non-existence 
of negative loops in systems; analyzing the behaviors of such loops is a long-standing topic in biology in the context of regulation, metabolism and development, starting from the work of Monod and Jacob in 1961 [?]. In this paper, we will define a measure of "degree of monotonicity" for dynamical systems and relate it to our topology-based redundancy measure.

IV. A NEW MEASURE OF REDUNDANCY

We will use the following notations for conciseness:

• For any two vertices $u$ and $v$, $u \stackrel{x}{\Rightarrow} v$ (respectively, $u \stackrel{x}{\rightarrow} v$) denotes a path (respectively, an edge) from $u$ to $v$ of parity $x$. We include the empty path $u \stackrel{1}{\Rightarrow} u$ for each vertex $u$.

• For any $E' \subseteq E$, reachable($E'$) is the set of all ordered triples $(u, v, x)$ such that $u \stackrel{x}{\Rightarrow} v$ exists in the subgraph $(V, E')$.

For example, for the network in Fig. 1, $B \stackrel{1}{\Rightarrow} D$ exists because of the path B ⊣ A ⊣ D and also because of the path B → C ⊣ A ⊣ D, and reachable({B → C, A ⊣ D}) = {(A, A, 1), (B, B, 1), (C, C, 1), (D, D, 1), (B, C, 1), (A, D, −1)}.
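The set reachable(E') can be computed by a breadth-first search over (vertex, parity) states. The sketch below is one straightforward way to do this in Python and is not the NET-SYNTHESIS implementation; it assumes that paths may revisit vertices, and the function and variable names are illustrative.

```python
from collections import deque

def reachable(vertices, edges):
    """Set of triples (u, v, x) such that a path of parity x leads from u to v.

    `edges` is an iterable of (u, v, label) with label in {+1, -1}.
    Assumption of this sketch: paths may revisit vertices. The empty path
    contributes (u, u, 1) for every vertex u.
    """
    succ = {v: [] for v in vertices}
    for u, v, label in edges:
        succ[u].append((v, label))
    triples = set()
    for source in vertices:
        seen = {(source, 1)}            # empty path of parity 1
        queue = deque(seen)
        while queue:
            node, parity = queue.popleft()
            for nxt, label in succ[node]:
                state = (nxt, parity * label)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
        triples.update((source, v, x) for v, x in seen)
    return triples

# The example from the text: reachable({B → C, A ⊣ D}) on vertices A, B, C, D.
V = ["A", "B", "C", "D"]
result = reachable(V, [("B", "C", 1), ("A", "D", -1)])
```

Each search visits at most 2|V| states, so the whole computation takes O(|V| (|V| + |E|)) time.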

We next state a combinatorial optimization problem 
that will be needed in order to introduce our new redun- 
dancy measure. 

Problem name: Binary Transitive Reduction (BTR).

Instance: a directed graph $G = (V, E)$ with a subset of edges $E_{\mathrm{fixed}} \subseteq E$ and an edge labeling function $w : E \to \{-1, 1\}$.

Valid Solution: a subgraph $G' = (V, E')$ such that

• $E' \supseteq E_{\mathrm{fixed}}$ and

• reachable($E'$) = reachable($E$).

($E \setminus E'$ is referred to as a set of "redundant" edges.)

Goal: minimize $|E'|$.




FIG. 2. Choosing one wrong edge may cost too much in BTR. 

Intuitively, the BTR problem prunes pathways for which alternate equivalent pathways exist (e.g., see [?]). The set of edges in $E_{\mathrm{fixed}}$ in the definition of BTR represents edges that may not be removed during the algorithm; this is useful in the context when one wishes to reduce a network while preserving specific pathways.

For the redundancy calculations performed in this paper, we assume no prior knowledge of direct interactions; thus for the rest of the paper we set $E_{\mathrm{fixed}} = \emptyset$. As an illustration, in Fig. 1 if we let $E' = E \setminus \{$B ⊣ A$\}$ then reachable($E'$) = reachable($E$) because of the path B → C ⊣ A.

Finding a maximum set of edges that can be removed is non-trivial; in fact, the problem is NP-hard [16]. To illustrate the algorithmic difficulties, consider the network shown in Fig. 2. Removal of all the black edges provides a non-optimal solution of BTR, whereas an optimal solution with about half the edges compared to the non-optimal solution can be obtained by keeping all the black edges and removing all but two of the gray edges. The special case of BTR with $E_{\mathrm{fixed}} = \emptyset$ and $w(e) = 1$ for all edges $e$ is the so-called classical minimum equivalent digraph problem, and it has been investigated extensively in the context of checking minimality of connectivity requirements in computer networks (e.g., see [?]). Other examples of applications of BTR-type network optimizations include the work by Wagner [17] employing a special case of BTR to determine network structure from gene perturbation data in the context of biological networks and the work by Dubois and Cecile [?] in the context of social network analysis and visualization.

Based on the BTR problem, we propose a new combinatorial measure of redundancy that can be computed efficiently. Note that BTR does not change pathway-level information of the network and removes edges from one node to another only when a similar alternate pathway exists, thus truly removing redundant connections. Thus, $|E'|$ provides a measure of global compressibility of the network and our proposed new redundancy measure $R_{\mathrm{new}}$ is defined to be

$$R_{\mathrm{new}} = 1 - \frac{|E'|}{|E|} \qquad (4)$$

The $|E|$ term in the denominator of the above definition translates to a "min-max normalization" of the measure [19], and ensures that $0 \le R_{\mathrm{new}} < 1$. Note that the higher the value of $R_{\mathrm{new}}$ is, the more redundant the network is.
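To make Equation (4) concrete, the following Python sketch removes edges greedily as long as the reachability set is unchanged and then evaluates the redundancy. This greedy loop is only an illustrative stand-in: it returns a valid BTR solution but not necessarily a minimum one, and it is not the algorithm used in NET-SYNTHESIS. The edge list contains only the Fig. 1 edges named in the text, and all function names are hypothetical.

```python
from collections import deque

def reachable(vertices, edges):
    """Triples (u, v, x): a path of parity x from u to v (paths may revisit
    vertices; the empty path gives (u, u, 1))."""
    succ = {v: [] for v in vertices}
    for u, v, label in edges:
        succ[u].append((v, label))
    triples = set()
    for source in vertices:
        seen = {(source, 1)}
        queue = deque(seen)
        while queue:
            node, parity = queue.popleft()
            for nxt, label in succ[node]:
                state = (nxt, parity * label)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
        triples.update((source, v, x) for v, x in seen)
    return triples

def greedy_btr(vertices, edges, fixed=()):
    """Drop edges one at a time while reachable() stays the same.
    Edges in `fixed` are never removed. Not guaranteed to be optimal."""
    target = reachable(vertices, edges)
    kept = list(edges)
    for e in list(edges):
        if e in fixed:
            continue
        trial = [f for f in kept if f != e]
        if reachable(vertices, trial) == target:
            kept = trial
    return kept

def r_new(edges, kept):
    """Equation (4): 1 - |E'| / |E|."""
    return 1 - len(kept) / len(edges)

V = ["A", "B", "C", "D"]
E = [("B", "C", 1), ("C", "A", -1), ("A", "D", -1), ("B", "A", -1)]
kept = greedy_btr(V, E)
redundancy = r_new(E, kept)   # B ⊣ A is the only removable edge here
```

On this four-edge example only B ⊣ A can be removed (B → C ⊣ A has the same parity), so three edges are kept and the redundancy is 1 − 3/4.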



A. Properties of Our Topological Redundancy 
Measure and Applications of a Minimal Network 

Any topological redundancy measure should have a desirable property: the measure must not only reflect simple connectivity properties such as degree sequence or average degree, it must also depend on higher-order connectivity. Our redundancy measure indeed has this property, since paths of arbitrary length are considered for removal of an edge. For a concrete example, consider the two graphs shown in Fig. 3: the in-degree and out-degree sequence of each graph is $\underbrace{1, 1, \ldots, 1}_{n/2+1}, \underbrace{2, 2, \ldots, 2}_{n/2-1}$, but their redundancy values are drastically different. Similarly, higher average degree does not necessarily imply higher values of redundancy; for example, the bottom network in Fig. 3, when generalized on $n$ nodes, has an average degree below 2 and a redundancy value of roughly 0.33, whereas the graph $K_{n/2,n/2}$ (a complete bipartite graph with each partition having $n/2$ nodes and all edges directed from the left to the right partition) has an average degree of $n/2$ but a redundancy value of 0.




FIG. 3. Two $n$-node graphs with the same degree sequence but with different values of $R_{\mathrm{new}}$, shown for $n = 8$. The top graph has no redundant edges, thus for it $R_{\mathrm{new}} = 0$. The dashed edges for the bottom graph can be removed, giving a strictly positive value of $R_{\mathrm{new}}$.



B. Computing $R_{\mathrm{new}}$

Although solving BTR exactly is an NP-hard problem, it has a rich combinatorial structure that allowed us to design an efficient approximation algorithm. The resulting algorithms were incorporated in our NET-SYNTHESIS software [?] (publicly available at www.cs.uic.edu/~dasgupta/network-synthesis/).

Although it is impossible to provide all details of the algorithmic approaches that were used for NET-SYNTHESIS, we provide some high-level details of the algorithm used; the reader can find further details, correctness proofs and algorithmic analysis in [?], [20]. It was proved in [20] that any strongly connected component (SCC) of the given graph $G = (V, E)$, say $(V_1, E_1)$ with $V_1 \subseteq V$ and $E_1 = (V_1 \times V_1) \cap E$, can be classified as one of two types: a single parity SCC if, for any two vertices $u, v \in V_1$, $u \stackrel{x}{\Rightarrow} v$ exists in the SCC for exactly one $x$ from $\{-1, 1\}$, and a multiple parity SCC if, for any two vertices $u, v \in V_1$, $u \stackrel{x}{\Rightarrow} v$ exists in the SCC for both $x = 1$ and $x = -1$. A high-level view of the algorithmic approach is shown in Fig. 4.
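The single/multiple parity dichotomy can be tested with one graph traversal. The sketch below is a simple alternative to the dynamic programming step mentioned for NET-SYNTHESIS, not a reproduction of it: it tries to assign each vertex the parity of a path from an arbitrary root and reports "multiple" as soon as some edge contradicts the assignment. It assumes the input really is strongly connected; the function name is illustrative.

```python
from collections import deque

def scc_parity_type(nodes, edges):
    """Classify a strongly connected signed digraph as 'single' or 'multiple'.

    `edges` holds (u, v, label) with label in {+1, -1}. If every vertex can be
    given a sign with sign[v] == sign[u] * label on every edge, all paths
    between a fixed pair of vertices share one parity ('single'); otherwise
    both parities occur ('multiple').
    """
    nodes = list(nodes)
    succ = {v: [] for v in nodes}
    for u, v, label in edges:
        succ[u].append((v, label))
    root = nodes[0]
    sign = {root: 1}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, label in succ[u]:
            expected = sign[u] * label
            if v not in sign:
                sign[v] = expected
                queue.append(v)
            elif sign[v] != expected:
                return "multiple"      # two paths of different parity meet at v
    return "single"

# Two-vertex cycles: the cycle parity decides the type.
negative_cycle = scc_parity_type(["A", "B"], [("A", "B", 1), ("B", "A", -1)])
positive_cycle = scc_parity_type(["A", "B"], [("A", "B", -1), ("B", "A", -1)])
```

The traversal touches every edge once, so the check runs in time linear in the size of the component.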

The running time of NET-SYNTHESIS is dominated by Step 2. Theoretically, the worst-case running time of the algorithm is $O(n^3)$, where $n$ is the number of vertices in $G$, but empirically the implementation allows us to calculate $R_{\mathrm{new}}$ for networks with up to about five to ten thousand nodes, thereby allowing us to compute the redundancy parameter for large networks. We expect that a future improved implementation of BTR will allow the calculation of redundancy values for even larger networks. Regarding optimality of the computed solution, theoretically NET-SYNTHESIS returns a solution that is a 3-approximation [14], i.e., $|E_{\mathrm{solution}}|$ is no more than three times that of an optimal solution in the worst case. However, extensive empirical evaluations reported in [?] suggest that in practice $|E_{\mathrm{solution}}|$ is almost always close to optimal (within an extra 10% of the optimal).



C. Illustration of Redundancy Calculation for a Small Biological Network

Our results of redundancy calculations on large-size biological and social networks are reported in a later section, but here we illustrate the redundancy and minimal network calculations on a biological network that arises from the repetition of a fixed gene regulatory network over a number of cells. This gene regulatory network is formed among products of the segment polarity gene family, which plays an important role in the embryonic development of Drosophila melanogaster. The interactions incorporated in this network include translation (protein production from mRNA), transcriptional regulation, and protein-protein interactions. Two of the interactions are inter-cellular: specifically, the proteins wingless and hedgehog can leave the cell they are produced in and can interact with receptor proteins in the membrane of neighboring cells. We select this network for several reasons. First, the core part of the network for a single cell is small, consisting of 13 nodes and 22 edges, which enables analytical calculations of redundancy and visual depiction of redundant edges. Secondly, in spite of its simplicity and regularity, the associated multi-cell network does exhibit non-trivial redundancies due to the inter-cellular interactions and the cyclic arrangement of cells. The network for a single cell was first published in [21] and later in slightly modified form in [22], [23]. Fig. 5(a) shows the network of [21] with the interpretation of the regulatory role of PTC_m on the reaction CI → CN as PTC_m → CN and PTC_m ⊣ CI. We note that the inter-cellular interactions are present at the whole cell membrane and not just the right boundary as shown for simplicity in all reconstructions. In a manner similar to that in other papers (e.g., see [?]), we build a 1-dimensional multi-cellular version by considering a row of $y$ cells, each of which has separate variables for each of the compounds, letting the cell-to-cell interactions be as in Fig. 5(a), but acting on both left and right neighbors, and using cyclic boundary conditions; see Fig. 5(b) for an illustration.

If the network contains $y > 2$ cells, then

• the number of vertices and edges are $13y$ and $22y$, respectively; and
1. Partition $G$ into SCCs, say $C_1 = (V_1, E_1), C_2 = (V_2, E_2), \ldots, C_p = (V_p, E_p)$.

2. An SCC is a single parity component if, for every pair of nodes $u$ and $v$, both $u \stackrel{1}{\Rightarrow} v$ and $u \stackrel{-1}{\Rightarrow} v$ do not exist in the SCC; otherwise it is a multiple parity component. Classify each SCC as single or multiple parity via a dynamic programming algorithm.

3. for each strongly connected component $C_i$ do
       Use a heuristic to compute a solution, say $E'_i$, of BTR for $C_i$.
       The heuristic repeatedly selects an edge $u \stackrel{x}{\rightarrow} v$ that can be removed until no such edges exist in the SCC.
       Several criteria are used to select such an edge, such as:
           • parity of $C_i$ (computed in Step 2)
           • length of the alternate path $u \stackrel{x}{\Rightarrow} v$
           • size (number of nodes) of $C_i$
   endfor

4. Build the following directed acyclic graph $G_S = (V_S, E_S)$ from $G$. At the end of the transformation, every edge $e$ of $G$ will be replaced by at most four edges in $G_S$; we say that these (at most four) edges are "generated" by $e$. The proof of correctness of the algorithm shows that, for each edge $e$, all or none of the edges generated by $e$ will be in the computed solution of $G_S$ in Step 5.

   for $i = 1, 2, \ldots, p$ do
       if $C_i$ is of multiple parity then
           replace $C_i$ by a node $y_i$
           if there is a directed edge $(u, v)$ with $u \notin C_i$ and $v \in C_i$ then add the two edges $u \stackrel{1}{\rightarrow} y_i$ and $u \stackrel{-1}{\rightarrow} y_i$
           if there is a directed edge $(u, v)$ with $u \in C_i$ and $v \notin C_i$ then add the two edges $y_i \stackrel{1}{\rightarrow} v$ and $y_i \stackrel{-1}{\rightarrow} v$
       endif

if Ci is of single parity then 

pick any vertex v G Ci] let 7+ = {x G Cj | v =4- a; exists in Ci}, and 7~ = {x G 
Ci | v a; exists in Ci} 

replace Ci by four nodes y+ , y++ , yr , y r~ , an d four edges y+ 4 y++ , y+ Ay", y," A y+ + , y,~ A 

yf~ 

for every edge u — > v with u £ Ci and v G Ci do 

if v G 7+ then add the two edges u A y 4 + and u A j/r 

if f G 7 _ then add the two edges u A yf and it A y 4 ~ 
endfor 

for every edge u A v with tieQ and w G do 

if D G 7 + then add the two edges y 4 ++ A w and y" A w 

if v G 7~ then add the two edges yf + A v and y" Aw 
endfor 
endif 
endfor 

5. Solve BTR for $G_S$ optimally by a greedy approach; let $E'_S \subseteq E_S$ be this solution.

6. Our solution $E_{\mathrm{solution}}$ of BTR for $G$ is as follows:
       Include all the edges in $\bigcup_{i=1}^{p} E'_i$ in $E_{\mathrm{solution}}$
       for every edge $e$ of $G$ do
           if the set of edges generated by $e$ is in $E'_S$ then include $e$ in $E_{\mathrm{solution}}$
       endfor

FIG. 4. A high-level view of the algorithmic approach in NET-SYNTHESIS to perform BTR.
FIG. 5. (a) The Drosophila segment polarity network for a single cell, redrawn from [?]. (b) A network of 4 cells. The redundant edges in each cell are colored light gray. The dark gray edges form an alternate pathway of the same parity for the edge $\mathrm{WG}_i \rightarrow \mathrm{wg}_i$.



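• NET-SYNTHESIS, after performing BTR, keeps $16y - 2$ edges, giving $R_{\mathrm{new}} = 1 - \frac{16y - 2}{22y} = \frac{3y + 1}{11y}$.

Identifying a molecule in the $i$th cell via a subscript $i$, NET-SYNTHESIS removed the following edges:

• the two edges $\text{WG\_m}_2 \rightarrow \text{en}_1$ and $\text{WG\_m}_1 \rightarrow \text{en}_y$, and

• the set of six edges from each cell $i$: $\text{PTC\_m}_i \rightarrow \text{PH\_m}_i$, $\text{PTC\_m}_i \dashv \text{CI}_i$, $\text{WG}_i \rightarrow \text{wg}_i$, $\text{CN}_i \dashv \text{en}_i$, $\text{CI}_i \rightarrow \text{wg}_i$ and $\text{CI}_i \rightarrow \text{ptc}_i$.

As can be seen, the redundancies depend in a non-trivial manner on higher-order connections. For example, the light gray edge $\text{WG}_i \rightarrow \text{wg}_i$ is redundant because of the alternate dark gray pathway shown in Fig. 5.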



D. Computing the Confidence Parameter for $R_{\mathrm{new}}$

We apply our redundancy measure to seven biological networks and four social networks (see Table I). For each (social or biological) network $G$ in Table I except networks (9) and (10), having a redundancy value of $R_{\mathrm{new}}(G)$, we generated 100 random networks, and computed the redundancies $R_{\mathrm{new}}(G_{\mathrm{random}_1}), R_{\mathrm{new}}(G_{\mathrm{random}_2}), \ldots, R_{\mathrm{new}}(G_{\mathrm{random}_{100}})$ of these random networks. We then use an (unpaired) one-sample Student's t-test to determine the probability that $R_{\mathrm{new}}(G)$ can be generated by a distribution that fits the data points $R_{\mathrm{new}}(G_{\mathrm{random}_1}), \ldots, R_{\mathrm{new}}(G_{\mathrm{random}_{100}})$.
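The test statistic behind a one-sample t-test can be sketched with the Python standard library alone. The redundancy values below are made-up numbers chosen purely for illustration; they are not results from the paper, and the function name is hypothetical.

```python
import math
import statistics

def one_sample_t(sample, observed):
    """t statistic and degrees of freedom for testing whether `observed`
    could be the mean of the distribution that produced `sample`."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)            # sample standard deviation (n - 1)
    t = (mean - observed) / (sd / math.sqrt(n))
    return t, n - 1

# Made-up redundancy values of random networks (illustration only).
random_values = [0.30, 0.32, 0.31, 0.29, 0.33]
t_stat, dof = one_sample_t(random_values, observed=0.45)
```

Turning the statistic into a p-value requires the Student t distribution with `dof` degrees of freedom; if SciPy is available, `scipy.stats.ttest_1samp(random_values, 0.45)` reports both the statistic and the two-sided p-value.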

The current implementation of NET-SYNTHESIS runs slowly due to its intensive disk access on networks (9) and (10) in Table I because network (9) is very dense (an average degree of 9.62 on 1133 nodes) and network (10) has a very large number of edges (24316 edges). Redundancy analysis of a single random graph generated for either of these two networks requires a week or more, and any meaningful statistics would require on the order of 100 random graphs for each network. Due to the prohibitive time requirements we were not able to report p-values for these two networks. Since the characteristics of various biological and social networks are of different nature, we generate random networks for the various networks using two different methods as explained below.

Ideally, for networks of a particular type, one would prefer to use an accurate generative null model for highest accuracy in p-values. For signaling and transcriptional biological networks (networks (1)-(5) in Table I), reference [14], based on an extensive literature review of similar kinds of biological networks in prior papers, arrived at the characteristics of a generative null model that is described below and used by us for these networks.^3 One of the most frequently reported topological characteristics of such networks is the distribution of in-degrees and out-degrees of nodes, which exhibit a degree distribution that is close to a power-law or a mixture of a power law and an exponential distribution [25-27]. Specifically, transcriptional regulatory networks have been reported to exhibit a power-law out-degree distribution, while the in-degree distribution is more restricted [?]. Based on such topological characterizations of signaling and transcriptional networks reported in the literature, [14] used the following degree distributions for the purpose of generating random networks for the biological transcriptional and signaling networks such as the ones in (1)-(5) in Table I:

• The number of vertices is the same as in the network $G$ whose redundancy value was computed.

• The in-degree and out-degree distributions of the random networks are as follows:

    - The distribution of in-degree of the networks is exponential, i.e., Pr[in-degree = $x$] = $c_1 e^{-c_1 x}$ with [?] < $c_1$ < [?] and a maximum in-degree of 12.

    - The distribution of out-degree of the networks is governed by a power-law, i.e., for $x \ge 1$, Pr[out-degree = $x$] = $c_2 x^{-c}$, and for $x = 0$, Pr[out-degree = 0] $\ge c_2$, with $2 < c < 3$ and a maximum out-degree of 200.

    - The parameters in the above distributions are adjusted such that the sum of in-degrees of all vertices is equal to the sum of out-degrees of all vertices and the expected number of edges is the same as in $G$.

• The percentage of activation/inhibition edges in the random network is the same as in $G$.

Each of the $r$ random networks with these degree distributions is generated using our private implementation of the method suggested by Newman, Strogatz and Watts in [?].

For social networks, for the C. elegans metabolic network and for the oriented PPI network (networks (6)-(11) in Table I), in the absence of a consensus on an accurate generative null model, we generated the $r$ random networks using a Markov-chain algorithm [30] in a similar manner as in, say, [24], by starting with the real network $G$ and repeatedly swapping randomly chosen pairs of connections in the following manner:^4

repeat
    choose two edges of $G = (V, E)$, $a \stackrel{x}{\rightarrow} b$ and $c \stackrel{y}{\rightarrow} d$, randomly and uniformly ($x, y \in \{-1, 1\}$)
    if $x \ne y$ or $a = c$ or $b = d$ or $a \stackrel{x}{\rightarrow} d \in E$ or $c \stackrel{y}{\rightarrow} b \in E$
        then discard this pair of edges
        else the random network contains the edges $a \stackrel{x}{\rightarrow} d$ and $c \stackrel{y}{\rightarrow} b$ instead of $a \stackrel{x}{\rightarrow} b$ and $c \stackrel{y}{\rightarrow} d$
until 20% of the edges of $G$ have been swapped
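The swapping procedure can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the `max_tries` guard, the fixed seed, the counting of two edges per successful swap, and the rule that a swap is rejected whenever the target pair of endpoints is already connected (regardless of label) are all assumptions of this sketch.

```python
import random

def swap_randomize(edges, fraction=0.2, seed=0, max_tries=100_000):
    """Degree-preserving randomization by swapping targets of same-sign edges.

    `edges` is a list of (a, b, x) with x in {+1, -1}; the input is assumed to
    have no parallel edges. Returns a new edge list.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = {(a, b) for a, b, _ in edges}
    target = int(fraction * len(edges))     # number of edges to be swapped
    swapped = 0
    for _ in range(max_tries):              # guard against non-termination
        if swapped >= target:
            break
        i, j = rng.sample(range(len(edges)), 2)
        (a, b, x), (c, d, y) = edges[i], edges[j]
        if x != y or a == c or b == d or (a, d) in present or (c, b) in present:
            continue                         # discard this pair of edges
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d, x), (c, b, x)
        swapped += 2
    return edges

# Hypothetical 10-node example: a positive ring plus negative chords.
example = [(i, (i + 1) % 10, 1) for i in range(10)] + \
          [(i, (i + 3) % 10, -1) for i in range(10)]
randomized = swap_randomize(example)
```

Because only the targets of two equally labeled edges are exchanged, every vertex keeps its in-degree and out-degree, and the proportion of activation and inhibition edges is unchanged.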



V. MEASURE OF MONOTONICITY FOR BIOLOGICAL NETWORKS

To explain the intuition behind the computation of a monotonicity measure of the dynamics of a biological system, we start by relating the time-dynamics of the system with the graph-theoretic model of the network in the following way. The time-varying system as defined by Equation (3) defines a labeled-graph model $G = (V, E, w)$ of the biological network in the following manner:

• $V = \{x_1, \ldots, x_n\}$;

• if $\frac{\partial f_j}{\partial x_i} \ge 0$ for all $x(t) = (x_1(t), x_2(t), \ldots, x_n(t))$ and $\frac{\partial f_j}{\partial x_i} > 0$ for some $x(t)$, then $(x_i, x_j) \in E$ and $w(x_i, x_j) = 1$;

• if $\frac{\partial f_j}{\partial x_i} \le 0$ for all $x(t)$ and $\frac{\partial f_j}{\partial x_i} < 0$ for some $x(t)$, then $(x_i, x_j) \in E$ and $w(x_i, x_j) = -1$.

(We assume that, for each $i$ and $j$, either $\frac{\partial f_j}{\partial x_i} \ge 0$ for all $x$ or $\frac{\partial f_j}{\partial x_i} \le 0$ for all $x$.)

^3 Our simulations with the alternate Markov-chain model used for the remaining networks show that the p-values still remain negligibly small; this is consistent with similar observations in another context made by Shen-Orr et al. [24].

^4 Shen-Orr et al. [24] consider swapping about 25% of the edges.




FIG. 6. Network for the system in Equation (5).

As an example, consider the following biological model of testosterone dynamics [?, ?]:

$$\frac{dx_1}{dt}(t) = \frac{A}{K + x_3(t)} - b_1 x_1(t)$$

$$\frac{dx_2}{dt}(t) = c_1 x_1(t) - b_2 x_2(t) \qquad (5)$$

$$\frac{dx_3}{dt}(t) = c_2 x_2(t) - b_3 x_3(t)$$
The corresponding labeled network for this system is shown in Fig. 6. It is easy to show that (5) is not monotone with respect to $\preceq_s$, for all possible $s$. On the other hand, if we remove the term involving $x_3$ in the first equation, we obtain a system that is monotone with respect to $\preceq_s$, $s = (1, 1, 1)$. A cause of non-monotonicity of the system is the existence of sign-inconsistent paths between two nodes in an undirected version of the network, i.e., the existence of both an activating and an inhibitory path between two nodes when the directions of the edges are ignored. To be precise, define a closed undirected chain in the labeled graph $G$ as a sequence of vertices $x_{i_1}, \ldots, x_{i_r}$ such that $x_{i_1} = x_{i_r}$, and such that for every $\lambda = 1, \ldots, r - 1$ either $(x_{i_\lambda}, x_{i_{\lambda+1}}) \in E$ or $(x_{i_{\lambda+1}}, x_{i_\lambda}) \in E$. Then, the following result holds [11] (see also [?] and [?, page 101]).

Lemma 2 [ill ] Consider a dynamical system ([3]) with 
associated directed labeled graph G. Then, ([3]) is mono- 
tone with respect to some orthant order if and only if 
all closed undirected chains of G have parity 1 . 

Note that the combinatorial characterization of monotonicity in
Lemma 2 is via the absence of undirected closed chains of parity −1.
Thus, in particular, any monotone system has

(a): no negative feedback loops, and

(b): no incoherent feed-forward loops.

However, some systems may not be monotone even if (a) and (b) hold;
see Fig. 7 for an example.
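
The characterization in Lemma 2 can be tested by a standard two-coloring argument: a network has no closed undirected chain of parity −1 exactly when its nodes can be labeled with ±1 so that every edge label equals the product of the labels of its endpoints. The Python sketch below illustrates this on the testosterone model (5); the function name `orthant_labeling` and the dictionary encoding of the labeled graph are assumptions made for this illustration only.

```python
from collections import defaultdict, deque


def orthant_labeling(edges):
    """Return labels L with w(u, v) == L[u] * L[v] on every edge, or None.

    edges: dict mapping (source, target) -> label in {-1, +1}.
    A labeling exists if and only if every closed undirected chain
    has parity 1, i.e. the product of its edge labels is +1.
    """
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))      # edge directions are ignored
    label = {}
    for start in list(adj):
        if start in label:
            continue
        label[start] = 1
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, w in adj[u]:
                if v not in label:
                    label[v] = label[u] * w
                    queue.append(v)
                elif label[v] != label[u] * w:
                    return None    # a chain of parity -1 was found
    return label


# Labeled network of system (5): x3 inhibits x1, x1 activates x2, x2 activates x3.
testosterone = {("x1", "x2"): 1, ("x2", "x3"): 1, ("x3", "x1"): -1}
# The same network after removing the term involving x3 from the first equation.
reduced = {("x1", "x2"): 1, ("x2", "x3"): 1}
```

Here `orthant_labeling(testosterone)` returns `None`, in agreement with the non-monotonicity of (5), while `orthant_labeling(reduced)` returns the all-ones labeling, which corresponds to s = (1, 1, 1).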

Lemma 2 leads in a natural manner to the following sign consistency
(SC) problem to determine how monotone a system is [11, 35].



FIG. 7. A non-monotone system with no negative feedback 
loops and no incoherent feed-forward loops. 



Problem name: Sign Consistency (SC).

Instance: a directed graph G = (V, E) with an edge labeling function
$w : E \to \{-1, 1\}$.

Valid Solution: a vertex labeling function $L : V \to \{-1, 1\}$.

Goal: maximize |F| where $F = \{(u, v) \mid w(u, v) = L(u)L(v)\}$ is a
set of "consistent" edges.

Similar to our redundancy measure, we define the degree of
monotonicity of a network to be

$$
M = \frac{|F|}{|E|} \tag{6}
$$

where F is the set of consistent edges in an optimal solution. The |E|
term in the denominator of the above definition translates to a
min-max normalization of the measure, and ensures that $0 \le M \le 1$.
Note that the higher the value of M is, the more monotone the network
is (cf. [?, 11]).
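
For very small networks the value of M can be obtained by exhaustive search over all vertex labelings, which makes the definition concrete. The sketch below is only an illustration of the definition and is exponential in the number of nodes; the function name `monotonicity_degree` and the edge encoding are assumptions of this example.

```python
from itertools import product


def monotonicity_degree(edges):
    """Exact M = |F| / |E| by trying every labeling L: V -> {-1, +1}.

    edges: dict mapping (source, target) -> label in {-1, +1}.
    Only practical for a handful of nodes (2^|V| labelings).
    """
    nodes = sorted({node for edge in edges for node in edge})
    best = 0
    for signs in product((1, -1), repeat=len(nodes)):
        labeling = dict(zip(nodes, signs))
        consistent = sum(
            1 for (u, v), w in edges.items() if w == labeling[u] * labeling[v]
        )
        best = max(best, consistent)
    return best / len(edges)


# Network of system (5): at most two of its three edges can be consistent.
testosterone = {("x1", "x2"): 1, ("x2", "x3"): 1, ("x3", "x1"): -1}
```

For the network of system (5) this gives M = 2/3, whereas dropping the inhibitory edge from x3 to x1 gives M = 1.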



A. Computing M

In [11] a semidefinite-programming (SDP) based approximation
algorithm is described for SC that has a worst-case theoretical
guarantee of returning at least about 88% of the maximum number of
consistent edges. The algorithm was implemented in MATLAB (the MATLAB
codes are publicly available at
www.math.rutgers.edu/~sontag/desz_README.html). Other algorithmic
implementations of the SC problem are described in [35, 36].



B. Computing Correlation Between M and R_new

After obtaining the six ordered pairs of values
$(M_1, R_{new_1}), \ldots, (M_6, R_{new_6})$ of M and R_new for the first six
networks in Table I, we computed the standard Pearson product moment
correlation coefficient

$$
r = \frac{\sum_{i=1}^{6} (R_{new_i} - \overline{R}_{new})(M_i - \overline{M})}
         {\sqrt{\sum_{i=1}^{6} (R_{new_i} - \overline{R}_{new})^2 \; \sum_{i=1}^{6} (M_i - \overline{M})^2}}
$$

where $\overline{R}_{new} = \frac{1}{6}\sum_{i=1}^{6} R_{new_i}$ and
$\overline{M} = \frac{1}{6}\sum_{i=1}^{6} M_i$ are the average redundancy and
monotonicity values, respectively. The possible values of r always lie
in the range [−1, 1], and the values −1 and 1 signify the strongest
negative and positive correlations, respectively. A p-value for this
correlation was calculated by a T-test with two-tailed distribution
and unequal variance to show the probability of getting a correlation
as large as the observed value by random chance when the true
correlation is zero.
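
As a check of the formula, the following Python sketch evaluates r on the redundancy and monotonicity values reported in Table II for networks (1) through (6). Only the coefficient is computed; the p-value calculation is not reproduced. The function name `pearson` is an assumption of this example.

```python
import math


def pearson(xs, ys):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Values for networks (1)-(6), taken from Table II.
r_new = [0.062, 0.434, 0.068, 0.438, 0.060, 0.669]
monotonicity = [0.796, 0.593, 0.862, 0.867, 0.926, 0.444]
r = pearson(r_new, monotonicity)
```

The resulting coefficient is approximately −0.8, matching the value discussed in Section VII E.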



[Figure: species tree rooted at "Cellular organisms", branching into
Bacteria (E. coli) and Eukaryota; Eukaryota into Fungi (S. cerevisiae)
and Metazoa; Metazoa into Pseudocoelomata (C. elegans) and Coelomata;
Coelomata into Protostomia (Drosophila) and Deuterostomia (Mammalia).]

FIG. 8. An unweighted species tree of the organisms for our
biological networks, constructed using the Taxonomy Browser
resources of NCBI [37]. The tree is not drawn to scale.



VI. NETWORK DATA 

We selected a total of 11 networks, seven biological 
ones and four social ones. We selected these networks 
with the following criteria in mind: 

• The biological networks were selected with an eye towards covering
  a diverse set of species on the evolutionary scale and towards
  covering networks of diverse natures (e.g., metabolic,
  transcriptional); a species tree of the biological organisms for
  our networks is shown in Fig. 8.

• The social networks were selected to cover interactions in
  different social environments.

• The networks span a wide range in size (number of edges ranging
  from 135 to 24316) and density (average degree ranging from 1.3 to
  13.84) to demonstrate that our new redundancy measure can be
  computed efficiently for a large class of networks.

Table I provides more details and sources for these networks.



VII. RESULTS AND DISCUSSIONS 

In Table II we show the tabulation of redundancy and, when
appropriate, also monotonicity values for our networks. Because of
their large sizes, p-values for the redundancy measure could not be
estimated very reliably for networks (9) and (10), since they require
runs on many random networks, each of which would take upwards of a
week; thus we do not report p-values for these networks. The extremely
low p-values in Table II indicate that the real networks' redundancy
values cannot be generated by a distribution that fits the
redundancies of the equivalent random graphs.

If one prefers, a normalization of the redundancy values of the
networks for which randomly generated networks are available can be
performed as follows. For each of the nine networks, we first computed
the standardized redundancy value for each of the 100 random networks
to eliminate sampling bias (for a sample $x_1, x_2, \ldots, x_m$ with
average $\mu$ and standard deviation $\sigma$, the standardized value of
$x_i$ is given by $(x_i - \mu)/\sigma$). Then, we calculated the
standardized range (difference between maximum and minimum) of these
100 standardized redundancy values. Finally, we normalized the
original redundancy values by dividing them by this standardized
range. The resulting normalized values are shown in Table III (for
comparison purposes, the normalized redundancy values are scaled so
that their summation is exactly the same as the summation of the
original redundancy values). As can be seen, the ranks of both
original and normalized values are almost the same (in the order (5),
(1), (3), (11), (2), (4), (7), (6), (8) and (5), (1), (3), (11), (4),
(2), (7), (6), (8), respectively) and the relative magnitudes of the
values are similar whether one uses the normalized or original values.
Thus, in the rest of the paper, we use the original redundancy values
with the understanding that all of our conclusions are valid for the
normalized values as well.
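
The three normalization steps can be sketched in Python as follows. This is an illustrative reading of the procedure and not the script used to produce Table III: the function name `normalize_redundancies`, the dictionary inputs and the use of the sample standard deviation (rather than the population one) are assumptions, and the numbers in the usage example are made up.

```python
import statistics


def normalize_redundancies(original, random_samples):
    """Normalize redundancy values by the standardized range of random samples.

    original: dict network -> redundancy of the real network.
    random_samples: dict network -> list of redundancies of its random networks.
    """
    raw = {}
    for net, value in original.items():
        sample = random_samples[net]
        mu = statistics.mean(sample)
        sigma = statistics.stdev(sample)          # assumption: sample std. dev.
        z = [(x - mu) / sigma for x in sample]    # standardized values
        standardized_range = max(z) - min(z)
        raw[net] = value / standardized_range
    # Rescale so that the normalized values sum to the same total as the originals.
    scale = sum(original.values()) / sum(raw.values())
    return {net: value * scale for net, value in raw.items()}


# Hypothetical toy input, for illustration only.
toy_original = {"A": 0.2, "B": 0.6}
toy_random = {"A": [0.1, 0.2, 0.3, 0.4], "B": [0.5, 0.55, 0.6, 0.7]}
toy_normalized = normalize_redundancies(toy_original, toy_random)
```

The final rescaling changes neither the ranks nor the ratios of the normalized values; it only makes them directly comparable in magnitude with the original ones.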

In spite of our somewhat limited set of experiments, 
our results do point to some interesting hypotheses, 
which we summarize below. 



A. R_new can be computed quickly for large
networks and is statistically significant

As our simulations show, the new redundancy measure can be computed
quickly for networks with up to thousands of nodes; for example,
NET-SYNTHESIS typically takes from a few seconds up to a minute for
networks having up to 1000 nodes or edges. This is a desirable
property of any redundancy measure so that it can be used by future
researchers as biological and social networks grow in number and size.
Moreover, the extremely low p-values suggest statistical significance
of the new measure.



B. Redundancy variations in biological networks 

We focus our attention on the variations of the redundancy values for
the five transcriptional/signaling biological networks in our dataset
and make the following observations.

a. Transcriptional vs. signaling networks Networks (1), (3) and (5)
are transcriptional networks, all having similar low redundancies
(0.062, 0.068 and 0.06).

TABLE I. Network data with sources. If duplicated edges were present in the original network, they were removed in the calculation
of the number of edges.

       Number     Number     Average
       of nodes   of edges   degree
       (n)        (m)        (m/n)     Brief description and reference

Biological networks

(1)    311        451        1.45      E. coli transcriptional regulatory network constructed by Shen-Orr,
                                       Milo, Mangan and Alon in [24] for direct regulatory interactions
                                       between transcription factors and the genes or operons they regulate;
                                       see http://www.nature.com/ng/journal/v31/n1/full/ng881.html

(2)    512        1047       2.04      Mammalian network of signaling pathways and cellular machines in the
                                       hippocampal CA1 neuron constructed by Ma'ayan et al. [38]; see
                                       http://www.sciencemag.org/content/309/5737/1078.abstract

(3)    418        544        1.3       E. coli transcriptional regulatory network (updated version of the
                                       network constructed by Shen-Orr, Milo, Mangan and Alon in [24]); see
                                       http://www.weizmann.ac.il/mcb/UriAlon/Papers/networkMotifs/coli1_1Inter_st.txt

(4)    ?          135        ?         T cell large granular lymphocyte (T-LGL) survival signaling network
                                       constructed by Zhang et al. [39]; see
                                       http://www.pnas.org/content/105/42/16308.abstract

(5)    690        1082       1.56      S. cerevisiae transcriptional regulatory network constructed by Milo
                                       et al. [40] showing interactions between transcription factor
                                       proteins and genes; see
                                       http://www.sciencemag.org/cgi/content/abstract/298/5594/824

(6)    651        2040       3.13      C. elegans metabolic network constructed by Jeong et al. [41] and
                                       also used by Duch and Arenas in [42].

(7)    786        2453       3.12      An oriented version of an unweighted PPI network constructed from
                                       S. cerevisiae interactions in the BioGRID database by Gitter,
                                       Klein-Seetharaman, Gupta and Bar-Joseph [43].

Social networks

(8)    198        2742       13.84     Network of Jazz musicians [44].

(9)    1133       10903      9.62      List of edges of the network of e-mail interchanges between members
                                       of the University Rovira i Virgili (Tarragona) [45].

(10)   11240      24316      2.16      Network of users of the Pretty-Good-Privacy algorithm for secure
                                       information interchange; edges connect users that trust each
                                       other [46].

(11)   1169       1912       1.63      Enron email network; available from UC Berkeley Enron Email Analysis
                                       (http://bailando.sims.berkeley.edu/enron_email.html).


On the other hand, network (2) is a signaling network and network (4)
is also predominantly signaling, though it includes four
transcriptional edges; these two mammalian signal transduction
networks have similar mid-range redundancies, namely 0.434 and 0.438,
respectively. We hypothesize that in general transcriptional networks
are less redundant than signaling networks. Straightforward supporting
evidence for this is the higher average degree of the signaling
networks as compared to the transcriptional ones. Transcriptional
networks have indeed been reported to have a feed-forward structure
with few feedback loops and relatively low cross-talk [47], whereas
[38] reports a large strongly connected component for their studied
signaling networks (which makes it possible to reach almost any node
from any input node).

b. Role of currency metabolites in redundancy of 
metabolite networks Our data-source for the C. ele- 
gans metabolic network includes two types of nodes, the 
metabolites and reaction nodes, and the edges are di- 
rected either from those metabolites that are the reac- 
tants of a reaction to the reaction node, or from the 
reaction node to the products of the reaction. In this 
representation, redundant edges appear if both (one of) 
the reactant(s) and (one of) the product(s) of a reaction 
appear as reactants of a different reaction, or conversely, 
both (one of) the reactant(s) and (one of) the product(s) 
of a reaction appear as products of a different reaction. 
Because a reaction cannot go forward if one of its reactants is not
present, the redundant edges are not biologically redundant and cannot
be eliminated. Our result of a surprisingly high redundancy value for
the metabolic network nevertheless indicates a high abundance of a
pattern, which warrants further investigation.

TABLE II. (a) Topological redundancy and (b) monotonicity values. Higher values of R_new (respectively, M) imply more
redundancy (respectively, monotonicity). In general, a p-value below $10^{-4}$ indicates statistical significance. N/A means not
applicable; a dash (-) indicates that the p-value could not be computed in reasonable time with the current implementation of
NET-SYNTHESIS because of its extensive disk access for networks that are too large or dense. Note that the p-values depend not
only on the average redundancies of the random networks but also on the higher order moments.

                                                      (a) Redundancy                                      (b) Monotonicity
Network                                               R_new    p-value               average redundancy   M
                                                                                     of random networks
Biological networks
(1) E. coli transcriptional                           0.062    1.43 × 10^{-2?}       0.188                0.796
(2) Mammalian signaling                               0.434    4.4 × 10^{-?}         0.576                0.593
(3) E. coli transcriptional                           0.068    2.61 × 10^{-?}        0.099                0.862
(4) T-LGL signaling                                   0.438    1.15 × 10^{-11}       0.350                0.867
(5) S. cerevisiae transcriptional                     0.060    9.34 × 10^{-43}       0.228                0.926
(6) C. elegans metabolic                              0.669    2.2 × 10^{-14?}       0.790                0.444
(7) Oriented S. cerevisiae protein interactions       0.481    3.68 × 10^{-111}      0.593                N/A
Social networks
(8) Jazz musicians network                            0.897    1.06 × 10^{-?}        0.929                N/A
(9) Email network at University Rovira i Virgili      0.840    -                     -                    N/A
(10) Secure information interchange user network      0.486    -                     -                    N/A
(11) Enron email network                              0.352    2.14 × 10^{-?8}       0.377                N/A

TABLE III. Normalization keeps relative magnitudes and ranks of values similar to that in the original.

Networks                        (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (11)
Original redundancy R_new       0.062   0.434   0.068   0.438   0.06    0.669   0.481   0.897   0.352
Normalized redundancy R_new     0.048   0.364   0.070   0.319   0.043   0.708   0.497   1.112   0.295

One possibility we considered is that one of the reac- 
tions is essentially a dimerization of a compound and its 
slightly modified variant. However, we found no strong 
support for this case. Another possibility is that metabo- 
lites that participate in a large number of reactions will 
have a higher chance to be the reactant or product of such 
"redundant" edges. There is a biological basis for this 
possibility in the existence of currency metabolites. Cur- 
rency metabolites (sometimes also referred to as carrier 
or current metabolites) are plentiful in normally function- 
ing cells and occur in widely different exchange processes. 
For example, ATP can be seen as the energy currency of 
the cell. Because of their wide participation in diverse 
reactions, currency metabolites tend to be the highest 



degree nodes of metabolic networks. There is some discussion in the
literature on how large the group of currency metabolites is, but the
consensus list includes H2O, ATP, ADP, NAD and its variants, NH4+, and
PO4^{3-} (phosphate) [48, 49].

Our data source for the C. elegans metabolic network 
indicates the identity of the 10 highest in-degree nodes 
(as a group) and the 10 highest out-degree nodes (as a 
group). Out of the 13 distinct nodes in the aggregate 
of these two groups, 11 belong in the consensus list of 
currency metabolites, leaving out co-enzyme A and L- 
glutamate. We found that when we rank the nodes of the 
network by the number of redundant edges (as found by 
NET-SYNTHESIS) incident upon them and consider the 
top 17 nodes in this rank order, they include all the 13 
highest degree nodes in the original networks. Thus we 
can conclude that the topological redundancy of the C.
elegans metabolic network is largely due to its inclusion
of currency metabolites.

FIG. 9. Adding the edge colored light gray may increase the
redundancy of the social network drastically (removed edges
shown as dotted).



C. Redundancy of Social vs. Biological Networks 

The results in Table II seem to suggest that social networks are more
redundant than biological networks. In fact, the two most redundant
networks in the table are the two social networks (8) and (9), whose
redundancies (0.897 and 0.840) exceed that of every biological network
considered (at most 0.669), and the remaining two social networks have
redundancies within the range spanned by the biological networks. We
hypothesize that in general this is the case. This hypothesis is
perhaps not very surprising in the context of past research, as
explained below.

The research work of Navlakha and Kingsford [50] suggests that
biological networks may grow and evolve quite differently than social
networks. In particular, they show that models for biological networks
may perform poorly for social networks and vice versa. It is
conceivable that different models may give rise to different
magnitudes of redundancy.

Some previous research works (e.g., see [51]-[53]) assert that social
networks tend to exhibit assortativity (i.e., highly connected nodes
tend to be connected with other high degree nodes), whereas biological
networks typically show disassortativity (i.e., high degree nodes tend
to attach to low degree nodes). It is not difficult to see that such
properties may lead to the difference in redundancies for the two
types of networks; for example, in Fig. 9 an edge between two nodes of
high degree results in removal of a large number of edges. To check
the general hypothesis of assortativity for our specific networks, we
computed the assortativity coefficient for a network as defined in
[?]. This coefficient is calculated in the following manner. First, we
ignore the direction of edges, obtaining an undirected graph
G = (V, E) from the given directed graph. Then, the assortativity
coefficient r is computed by the following formula:

$$
r = \frac{\dfrac{1}{|E|} \sum_{\{u,v\} \in E} d_u d_v - \left[ \dfrac{1}{2|E|} \sum_{\{u,v\} \in E} (d_u + d_v) \right]^2}
         {\dfrac{1}{2|E|} \sum_{\{u,v\} \in E} (d_u^2 + d_v^2) - \left[ \dfrac{1}{2|E|} \sum_{\{u,v\} \in E} (d_u + d_v) \right]^2}
$$

where $d_u$ denotes the degree of a node u. It is known that
$-1 \le r \le 1$, with more negative (respectively, more positive)
values of r indicating more disassortativity (respectively, more
assortativity) of the given network. As Table IV shows, all biological
networks are disassortative, whereas all but one social network are
assortative.
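
A direct transcription of the assortativity formula into Python is given below as a sketch. The function name `assortativity`, the edge-list input, and the choices to collapse antiparallel edges into one undirected edge and to discard self-loops are assumptions of this illustration rather than details stated above.

```python
from collections import Counter


def assortativity(directed_edges):
    """Degree assortativity coefficient r of the underlying undirected graph.

    directed_edges: iterable of (source, target) pairs.
    Raises ValueError when all edges join nodes of equal degree,
    in which case the denominator of the formula vanishes.
    """
    # Ignore directions; assumption: drop self-loops and merge u->v with v->u.
    undirected = {frozenset(e) for e in directed_edges if e[0] != e[1]}
    pairs = [tuple(e) for e in undirected]
    degree = Counter(node for pair in pairs for node in pair)
    m = len(pairs)
    sum_prod = sum(degree[u] * degree[v] for u, v in pairs)
    sum_deg = sum(degree[u] + degree[v] for u, v in pairs)
    sum_sq = sum(degree[u] ** 2 + degree[v] ** 2 for u, v in pairs)
    mean_sq = (sum_deg / (2 * m)) ** 2
    denominator = sum_sq / (2 * m) - mean_sq
    if denominator == 0:
        raise ValueError("assortativity is undefined for this graph")
    return (sum_prod / m - mean_sq) / denominator


star = [(0, 1), (0, 2), (0, 3)]   # hub joined to three leaves
path = [(0, 1), (1, 2), (2, 3)]   # simple path on four nodes
```

A star, in which the single high-degree node attaches only to degree-one nodes, is maximally disassortative (r = −1), and the four-node path gives r = −0.5.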

Finally, social networks that are related to human behavior are often
expected to exhibit a high degree of transitivity [53]-[55]. For
example, the classical work of Leinhardt [56] asserts that the
structure of interpersonal relations in children's groups will
progress in consistent fashion from less to more transitive
organization as the children become older. Transitivity in this type
of behavioral context translates to coherent type 1 feed-forward loops
(i.e., feed-forward loops of the form A → B, B → C and A → C), each of
which contains a redundant edge, and thus higher transitivity
immediately implies higher redundancy in our context. To check how far
this general hypothesis holds for our specific networks, we calculated
the transitivity coefficient for our networks. The transitivity
coefficient $\tau$ of a directed network [57] is defined in terms of
$\mu_2$ and $\mu_3$, the numbers of ordered triplets of vertices that have
two and three edges among them, respectively. We used an obvious
algorithm to calculate this value; $\tau$ could not be calculated within
reasonable time for the social network (10) in Table I because of its
large number of nodes and edges. As shown in Table IV, all the
biological networks have small transitivity coefficients, and among
the social networks, network (8) has a value of $\tau$ that is
significantly larger than that of any of the biological networks.



D. Redundancy, minimality and orienting PPI 
networks 



Protein interaction networks represent physical interactions among
proteins. While many protein interactions have an orientation, the
current maps of protein-protein interaction (PPI) networks are often
unoriented (undirected), in part due to the limitations of current
experimental technologies. Thus, there is an obvious interest in
trying to orient these networks by, say, combining causal information
at the cellular level. Unfortunately, most versions of the orientation
problem are theoretically NP-hard [?, 60], and thus heuristics for
such orientations may either not lead to all pathways of interest or
lead to extra spurious pathways that are not supported [?, 60].

Our calculation of redundancy values and minimal networks provides a
way to gain insight into a predicted orientation of a PPI network and
to determine whether the predicted oriented network has a level of
redundancy similar to those in known biological networks. Obviously,
the lower the value of R_new is, the more compact is the construction
of the oriented network. However, one must also ensure that the
minimal network contains the right kind of pathways, e.g., paths in
the "gold standard". To this effect, we describe the results of this
approach via the NET-SYNTHESIS software on an oriented PPI network
from [43].
TABLE IV. Values of the assortativity coefficient r and the transitivity coefficient $\tau$. Negative values of r indicate
disassortativity whereas positive values of r indicate assortativity.

Network   Biological                                                        Social
index     (1)      (2)      (3)      (4)      (5)      (6)      (7)       (8)      (9)      (10)     (11)
r         -0.149   -0.106   -0.204   -0.089   -0.398   -0.060   -0.1377   +0.02    +0.07    +0.239   -0.44
$\tau$     0.037    0.010    0.007    0.043    0.005    0.047    0.017     0.255    0.058    -        0.013

We first briefly review the method by which the oriented PPI network
used by us was generated. The starting point for the network consisted
of all physical interactions among yeast proteins from version 2.0.51
of BioGRID [61]. Edge weights were assigned based on the type and
quantity of experimental support for each interaction, and low-weight
edges were removed from the network. The network was oriented so as to
maximize the weighted number of length-bounded paths between
predetermined sources and targets, which were taken from yeast MAPK
signaling pathways. The final set of 2435 edges included all oriented
edges that belonged to any path with 5 or fewer edges between a source
and target, and edge weights were dropped for subsequent analysis. The
sources, targets, PPI filtering and orientation algorithm are
described more fully in [43].

Now we discuss the paths in the non-redundant network (after
reduction via NET-SYNTHESIS) that are present in the gold standard.
Several of the short source-target paths in this network correspond to
known yeast MAPK signaling pathways, specifically the pheromone
response and filamentous growth pathways
(www.genome.jp/kegg/pathway/sce/sce04011.html). Fig. 10 depicts the
union of all linear paths in the non-redundant network that have
multiple consecutive edges that match a gold standard path. The paths
that matched a gold standard path are highly similar, and the common
gold standard edges in these hits are Ste7 → Fus3, Fus3 → Dig1 and
Dig1 → Ste12.



E. Correlation between redundancy and network
dynamics

The Pearson correlation coefficient between M and R_new is about −0.8
with a p-value of 0.0066. Thus, monotonicity is negatively correlated
to redundancy (i.e., higher values of redundancy are expected to
accompany lower values of monotonicity and vice versa).

As explained before, monotonicity is known to be negatively correlated
to negative feedback loops [11, ?]. Negative feedback loops also tend
to increase the redundancy of signal transduction networks; see
Fig. 11 for an illustration. Indeed, strongly connected components
with at least one negative feedback loop were called multiple parity
components in [20] and played a significant role in redundancy
calculations.

Furthermore, recent results of Kwon and Cho [63] on the correlation
between topological properties and robustness of networks are also
consistent with the negative correlation that we obtained. The authors
of that paper considered a weighted network model in which the state
of each node is a real number in the range [−1, 1] and the positive
and negative weights of the connections represent the strengths of the
excitatory or inhibitory connections, respectively. A negative
(respectively, positive) feedback loop is then defined to be a simple
cycle with an odd (respectively, even) number of negative weights in
the cycle, and the degree of robustness of a network is then defined
by selecting a group of nodes randomly, perturbing the values of their
states, and measuring the extent of change of states of various nodes
in the network by computing the ratio of state values converging to
the same final state to which the original initial state converged
(biologically, this concept of robustness means the extent of
maintaining the original stable state against given perturbations).
Based on extensive simulation results, the authors concluded that
networks with fewer negative feedback loops are likely to be more
robust in their sense. More robustness with respect to perturbations
suggests less influence of one node on another, and consequently fewer
alternate pathways of the same nature from one node to another,
indicating lower redundancy values, whereas fewer negative feedback
loops correspond to a higher degree of monotonicity. Thus, their
observation is, at least on an intuitive level, consistent with our
finding.



F. Significance of a minimal network 

It is certainly an interesting question to ask if a topologically
minimal network has similar dynamical or functional properties as the
original network. Note that the question does not make sense for the
four (static) social networks (networks (8), (9), (10) and (11) in
Table I), since the individual nodes in these networks usually do not
have well-defined functions or dynamics, and one of their most
interesting properties, namely connectivity, is preserved in the
minimal network. The redundancy issue of the metabolic network
(network (6) of Table I) is explained separately in detail in
Section VII B b. There is no associated dynamics with the oriented PPI
network (network (7) of Table I). Thus, this question only applies to
the first five biological networks (networks (1), (2), (3), (4) and
(5)) in Table I. A dynamic description/model of these networks would
characterize dynamic behaviors, such as stability and response to
external inputs. When the network has designated outputs or readouts,
such as gene expression rates in transcriptional networks, it may be
of interest to characterize the behavior of these outputs as a
function of the inputs.

FIG. 10. (color online) Paths in the non-redundant oriented
PPI network that match known yeast signaling pathways.
Solid edges are present in the gold standard and dashed edges
represent novel predictions.

A topologically minimal network has the same input-output
connectivity (reachability) as the original and thus the excitatory or
inhibitory influence between each input-output pair is preserved. It
is minimal in the "information theoretic" sense in that any network
with the same output behavior must be of at least this size. A
correlation of the redundancy measure with the monotonicity of
dynamics is explored in Section VII E. Will a topologically minimal
network also have the same output behavior as the original one for the
same input? In general, there is no such guarantee since the dynamics
depend on what type of functions ("gate") are used to combine incoming
connections to nodes and the "time delay" in the signal propagation,
both of which are omitted in the graph-theoretic representation of
regulatory and signal-transduction networks such as (1)-(5) in
Table I. For example, consider the two networks shown in Fig. 12, in
which network (b) has a redundant connection A → C. The functions of
these two circuits could be different, however, depending on the
"gate" function used to combine the inputs B → C and A → C in network
(b). Due to the shared A → B → C connectivity in the two networks, in
both cases node C will be activated if A is continuously supplied.
However, while network (a) merely implements a delay between C and A,
the coherent type-1 feedforward loop indicated in (b) is what [64]
calls a "sign-sensitive delay element" that filters spikes in signals
(low-pass filter) provided that an "AND" gate combines the inputs to
node C; one example of such a circuit is that of the Arabinose system
in E. coli [65]. In summary, deleting edges may result in
functionalities that are not exactly the same.

FIG. 11. The network shown has no negative feedback loops and no
redundant edges. However, if we replace the gray activation edge
$v_3 \to v_2$ by an inhibition edge $v_3 \dashv v_2$, a negative feedback
loop is created and this makes all the remaining inhibitory edges in
the network redundant (e.g., the edge $v_1 \dashv v_4$ is redundant because
of the path $v_1 \to v_2 \to v_3 \dashv v_2 \to v_4$).
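
The difference between the two circuits can be made concrete with a toy simulation. The sketch below uses a synchronous Boolean model with one time step of delay per edge; this discretization, the function name `simulate` and the input sequences are assumptions chosen for illustration and are not part of the graph-theoretic framework above.

```python
def simulate(inputs, use_shortcut):
    """Synchronous Boolean dynamics of A -> B -> C; returns the states of C.

    inputs: sequence of 0/1 values applied to node A at successive steps.
    use_shortcut: if True, C also receives A directly and combines the
    two inputs with an AND gate (coherent type-1 feed-forward loop);
    if False, C simply copies B (plain chain).
    """
    b = c = 0
    trajectory = []
    for a in inputs:
        trajectory.append(c)
        next_b = a                                    # B follows A after one step
        next_c = (a and b) if use_shortcut else b     # gate at node C
        b, c = next_b, next_c
    trajectory.append(c)
    return trajectory


spike = [1, 0, 0, 0]       # A is active for a single step
sustained = [1, 1, 1, 1]   # A is continuously supplied
```

With the sustained input both circuits switch C on after two steps, but with the one-step spike only the plain chain ever activates C; the AND-gated feed-forward loop filters the spike out, so removing the "redundant" edge A → C changes the response to brief inputs.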



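[Figure: panel (a) shows the chain A → B → C; panel (b) shows the same
chain with the additional connection A → C.]

FIG. 12. Equivalence of dynamics depends on node functions.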

However, despite the fact that a minimal network may not preserve all
dynamic properties of the original one, a significant application of
finding minimal networks lies precisely in allowing one to identify
redundant connections (edges). In this manner, one may focus on
investigating the functionalities of these redundant edges; e.g.,
identifying the manner in which their effect is cumulated with those
of the other regulators of their target nodes could be a key step
toward understanding the behavior of the entire network.

Thus, the tools developed here are of general interest, as they not
only provide a quantified measure of the overall redundancy of the
network but also allow the identification of individual redundancies,
and hence help direct future research toward the understanding of the
functional significance of the added links.



VIII. AVAILABILITY OF DATA AND 
SOFTWARE 

Most of the data for the original networks as well as those for the
random networks used in the calculation of p-values for R_new are
available from our website www.cs.uic.edu/~dasgupta/network-data/.
The NET-SYNTHESIS software for calculating redundancies is available
from our website www.cs.uic.edu/~dasgupta/network-synthesis/.
MATLAB codes for computing monotonicity values are available from our
website www.math.rutgers.edu/~sontag/desz_README.html.



IX. CONCLUSIONS 

In this paper we have defined a new combinatorial mea- 
sure of redundancy of biological and social networks, and 
have illustrated its efficient computation on several small 
and large networks. We also noted some interesting hy- 
potheses that one could draw from these results such as: 



• Transcriptional networks are likely to be less redun- 
dant than signaling networks. 

• The topological redundancy of the C. elegans 
metabolic network is largely due to its inclusion 
of currency metabolites. 

• Social networks are prone to be more redundant 
than biological networks. 

• Our calculation of redundancy values and minimal 
networks provides a way to gain insight into a pre- 
dicted orientation of a protein-protein-interaction 
(PPI) network and determine whether the predicted 
oriented network has a level of redundancy similar 
to those in known biological networks. 

• Our topology-based redundancy measure for bio- 
logical signaling networks is statistically correlated 
with some measure of the dynamics of the network, 
namely higher redundancy is correlated to lower 
monotonicity and vice versa. 

We believe that our fast and accurate computation of the redundancy
measure will help future researchers to further fine-tune the measure
and test it on a large-scale basis. An interesting question that has
been partially addressed in the past literature but deserves further
investigation is to determine the reasons for the redundancy of
various kinds of biological networks.